COUNTING TANGLEGRAMS WITH SPECIES

IRA M. GESSEL* 


Abstract. A tanglegram is a pair of binary trees with the same set of leaves. We use the 
theory of combinatorial species to count unlabeled tanglegrams of various kinds. 


1. Introduction 

A tanglegram is a diagram, used in biology to compare phylogenetic trees, consisting 
of two (usually binary) trees together with a matching of their leaves. Tanglegrams were 
recently counted by Billey, Konvalinka, and Matsen [4], and we refer to this paper (and their
related paper [3]) for references to biological applications. We answer here several questions 
raised by Billey, Konvalinka, and Matsen, by giving formulas for counting three variations 
of tanglegrams. 

We define a binary tree to be a rooted tree in which every vertex has either zero or two
children, and in which the leaves (vertices with no children) are labeled but the interior
vertices are unlabeled. (See Figure 1.) We also consider the tree with only one (labeled)
vertex to be a binary tree. The children of an interior vertex are not ordered, so, for example,
there is one binary tree with label set $\{1, 2\}$. It is not hard to show that the number of binary
trees with label set $[n] = \{1, 2, \dots, n\}$ is $1 \cdot 3 \cdots (2n-3)$ for $n > 1$ (see, e.g.,
[12, Example 5.2.6]).

Figure 1. A binary tree

We define a labeled tanglegram to be an ordered pair of binary trees with the same set of
leaves. Figure 2 shows a labeled tanglegram with three leaves and Figure 3 shows another
way of drawing the same tanglegram. Labeled tanglegrams are easy to count: the number
of labeled tanglegrams with $n$ leaves is $(1 \cdot 3 \cdots (2n-3))^2$.
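The two labeled counts above are easy to tabulate. The following is a minimal sketch in Python; the function names `labeled_binary_trees` and `labeled_tanglegrams` are illustrative choices and do not come from the paper.

```python
def labeled_binary_trees(n):
    """Number of binary trees with leaf set [n]: 1 * 3 * ... * (2n - 3)."""
    count = 1
    for odd in range(1, 2 * n - 2, 2):  # odd runs over 1, 3, ..., 2n - 3
        count *= odd
    return count


def labeled_tanglegrams(n):
    """Number of ordered pairs of binary trees on the same leaf set [n]."""
    return labeled_binary_trees(n) ** 2


print([labeled_binary_trees(n) for n in range(1, 7)])
print([labeled_tanglegrams(n) for n in range(1, 7)])
```

For n = 1 the product is empty, so the sketch returns 1, matching the convention that the one-vertex tree is a binary tree.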

Figure 2. A labeled tanglegram with three leaves

Figure 3. Another representation of the tanglegram of Figure 2

An unlabeled tanglegram is an isomorphism class of tanglegrams, where two tanglegrams
are considered to be isomorphic if one can be obtained from the other by permutation of
the labels. Billey, Konvalinka, and Matsen [4] proved a formula for the number of unlabeled
tanglegrams with $n$ leaves. They left open the problem of counting several variations of
unlabeled tanglegrams: unordered tanglegrams, unrooted tanglegrams, and unordered unrooted
tanglegrams. An unordered tanglegram is an unordered pair of binary trees (not necessarily
distinct) with the same set of leaves, and an unrooted tanglegram is an ordered pair of
unrooted trees with the same set of leaves (vertices of degree one), in which every vertex of
each tree has degree one or three, and only the leaves are labeled. (See Figure 4.)



Figure 4. An unrooted tanglegram with five leaves


2. Species 

The theory of combinatorial species, initiated by André Joyal [8, 9], allows us to construct
combinatorial objects in ways that enable us to count various types of tanglegrams. We give
here a very brief account of part of the theory; we refer the reader to Bergeron, Labelle, and
Leroux [2] for a comprehensive exposition. A concise introduction to the theory of species
can be found in [7].

A species is a functor from the category of finite sets with bijections to itself. A species
$F$ associates to each finite set $A$ a finite set $F[A]$, called the set of $F$-structures on $A$,
and associates to each bijection of finite sets $\sigma : A \to B$ a bijection $F[\sigma] : F[A] \to F[B]$.
In particular, any bijection $\pi : [n] \to [n]$ yields a bijection $F[\pi] : F[n] \to F[n]$, where
$F[n] = F[\{1, 2, \dots, n\}]$, so the symmetric group $\mathfrak{S}_n$ acts on the set $F[n]$. The
$\mathfrak{S}_n$-orbits under this action are called unlabeled $F$-structures of order $n$. To any
species $F$ we may associate its cycle index series $Z_F$, a symmetric function defined in terms of
the power sum symmetric functions^1 $p_i = \sum_{j=1}^{\infty} x_j^i$. We define

$$Z_F = Z_F(p_1, p_2, \dots) = \sum_{n \ge 0} \frac{1}{n!} \sum_{\sigma \in \mathfrak{S}_n} \operatorname{fix} F[\sigma]\, p_\sigma, \tag{1}$$

where $\operatorname{fix} F[\sigma] = |\{s \in F[n] : F[\sigma](s) = s\}|$ and
$p_\sigma = p_1^{c_1} p_2^{c_2} \cdots$, where $c_i$ is the number of $i$-cycles of $\sigma$. Since
$\operatorname{fix} F[\sigma]$ depends only on the cycle type of $\sigma$, the cycle index can also be
written as

$$Z_F = \sum_{n \ge 0} \Bigl( \sum_{\lambda \vdash n} \operatorname{fix} F[\lambda]\, \frac{p_\lambda}{z_\lambda} \Bigr).$$

Here the sum over $\lambda \vdash n$ is over all partitions $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_k)$
of $n$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \ge 1$; $\operatorname{fix} F[\lambda]$ is the
number of $F$-structures on $[n]$ fixed by $F[\sigma]$, where $\sigma$ is a permutation of $[n]$ with cycle
type $\lambda$; $p_\lambda$ is the power sum symmetric function indexed by the partition $\lambda$ of $n$,
defined by $p_\lambda = p_{\lambda_1} p_{\lambda_2} \cdots$; and $n!/z_\lambda$ is the number of permutations in
$\mathfrak{S}_n$ of cycle type $\lambda$ (if $\lambda$ has $m_i$ parts equal to $i$ for each $i$, then
$z_\lambda = 1^{m_1} m_1!\, 2^{m_2} m_2! \cdots$).

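For any species $F$, we denote by $F(x)$ the ordinary generating function for unlabeled
$F$-structures; that is,

$$F(x) = \sum_{n=0}^{\infty} f_n x^n, \tag{2}$$

where $f_n$ is the number of unlabeled $F$-structures of order $n$. Then

$$F(x) = Z_F(x, x^2, x^3, \dots).$$

Equivalently, the number of unlabeled $F$-structures of order $n$ is
$\sum_{\lambda \vdash n} \operatorname{fix} F[\lambda] / z_\lambda$, as can also be seen directly by
Burnside's lemma (see Section 3).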


^1 In many accounts of cycle indices our $p_i$ are taken simply as indeterminates. In particular,
our $p_i$ is written as $x_i$ in [2].

We note that the sum of the terms of degree $n$ in $Z_F$ forms the "characteristic" or "Frobenius
image" of the representation of $\mathfrak{S}_n$ associated with the action of $\mathfrak{S}_n$ on $F[n]$
(see, e.g., [12, pp. 351-352 and 395-396]), and all of the operations on species that we discuss
have analogues for representations of symmetric groups. This is one of the reasons why we
consider cycle indices to be symmetric functions.
One of the most important species is the species $E_n$ of $n$-sets, defined by

$$E_n[A] = \begin{cases} \{A\}, & \text{if } |A| = n, \\ \varnothing, & \text{otherwise.} \end{cases}$$

The cycle index of $E_n$ is the complete symmetric function $h_n$, defined by

$$h_n = \sum_{i_1 \le i_2 \le \cdots \le i_n} x_{i_1} x_{i_2} \cdots x_{i_n},$$

and also given by the formula

$$h_n = \sum_{\lambda \vdash n} \frac{p_\lambda}{z_\lambda}.$$
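The quantities $z_\lambda$ and the sum over partitions of $n$ recur throughout what follows, so a small computational sketch may help. The Python below is illustrative only: the helper names `partitions`, `z`, and `unlabeled_count` are assumptions made for this sketch. It checks the formula for $h_n$ in the form $\sum_{\lambda \vdash n} 1/z_\lambda = 1$, which says that there is exactly one unlabeled $n$-set.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def z(lam):
    """z_lambda = prod_i i^{m_i} * m_i!, where m_i counts the parts equal to i."""
    result = 1
    for part, mult in Counter(lam).items():
        result *= part ** mult * factorial(mult)
    return result


def unlabeled_count(n, fix):
    """Sum of fix(lambda) / z_lambda over all partitions lambda of n."""
    return sum(Fraction(fix(lam), z(lam)) for lam in partitions(n))


# For E_n every permutation fixes the unique structure, so fix is identically 1.
print([unlabeled_count(n, lambda lam: 1) for n in range(1, 8)])
```

Exact rational arithmetic (`Fraction`) is used so that the sum of the terms $1/z_\lambda$ can be compared with 1 without rounding.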


The special case $E_1$, the species of singleton sets, is denoted by $X$. It has cycle index
$Z_X = p_1$.

Given species $F$ and $G$ we can combine them to get the sum $F + G$, the product $FG$,
the composition $F(G)$ (also denoted $F \circ G$), and the Cartesian product $F \times G$, and these
operations on species translate into operations on cycle indices. We refer the reader to [2,
pp. 1-58] for details about these operations.

For the sum, $(F + G)[A]$ is the disjoint union of $F[A]$ and $G[A]$. An $FG$-structure on
the set $A$ is obtained by partitioning $A$ into disjoint subsets $B$ and $C$ (possibly empty) and
taking an $F$-structure on $B$ and a $G$-structure on $C$.

An $F(G)$-structure on the set $A$ is an $F$-structure of $G$-structures; more precisely, $F(G)[A]$
is the set of triples $(\pi, \alpha, \beta)$, where $\pi$ is a partition of the set $A$, $\alpha$ is an
$F$-structure on $\pi$, and $\beta$ is a set of $G$-structures on the blocks of $\pi$. In particular, an
$E_n(G)$-structure on $A$ is a partition of $A$ into $n$ blocks, together with a $G$-structure on each block.

The Cartesian product is defined by $(F \times G)[A] = F[A] \times G[A]$; thus an $F \times G$-structure
is a pair of structures on the same set.

The corresponding operations for cycle indices are simple for the sum and product:
$Z_{F+G} = Z_F + Z_G$ and $Z_{FG} = Z_F Z_G$. The cycle index operation for composition of species is an
operation on symmetric functions called composition or plethysm (see, e.g., [12, p. 447]).
The composition of $f$ and $g$, denoted by $f[g]$ or $f \circ g$, may be defined by

$$f[g] = f\bigl(g(p_1, p_2, p_3, \dots),\, g(p_2, p_4, p_6, \dots),\, \dots\bigr);$$

i.e., $f[g]$ is obtained from $f$ by replacing each $p_i$ with $p_i[g] = g(p_i, p_{2i}, p_{3i}, \dots)$. Then
$Z_{F(G)} = Z_F[Z_G]$. In particular, since $Z_{E_2} = h_2 = \frac{1}{2}(p_1^2 + p_2)$, we have
$Z_{E_2}[g] = \frac{1}{2}(g^2 + p_2[g])$. The cycle index operation corresponding to the Cartesian product
on species is an operation on symmetric functions called the Kronecker product, internal product,
or inner product. The Kronecker product, denoted by $*$, is defined by
$p_\lambda * p_\mu = z_\lambda \delta_{\lambda\mu} p_\lambda$ and linearity, or equivalently,

$$\sum_\lambda a_\lambda \frac{p_\lambda}{z_\lambda} * \sum_\lambda b_\lambda \frac{p_\lambda}{z_\lambda} = \sum_\lambda a_\lambda b_\lambda \frac{p_\lambda}{z_\lambda}.$$

Then $Z_{F \times G} = Z_F * Z_G$.

As is customary in discussing species we will consider isomorphic species to be equal; for 
example, in equation (3) below the two sides are really isomorphic rather than equal. 
3. Tanglegrams


Let $R$ be the species of (rooted) binary trees with labeled leaves and unlabeled internal
vertices. A binary tree is either a single labeled vertex or an unlabeled root together with
an unordered pair of binary trees. Thus $R$ satisfies the equation

$$R = X + E_2(R), \tag{3}$$

so the cycle index $Z_R$ satisfies

$$Z_R = p_1 + h_2[Z_R]. \tag{4}$$

Terms of $Z_R$ can be computed fairly easily by successive substitution in (4), though there
are other ways to compute them that are more efficient. (See, e.g., [10, Corollary Dl] and
Section 6.) The first few terms of $Z_R$ are

$$p_1 + \bigl(\tfrac{1}{2} p_1^2 + \tfrac{1}{2} p_2\bigr) + \bigl(\tfrac{1}{2} p_1^3 + \tfrac{1}{2} p_1 p_2\bigr) + \bigl(\tfrac{5}{8} p_1^4 + \tfrac{3}{8} p_2^2 + \tfrac{3}{4} p_1^2 p_2 + \tfrac{1}{4} p_4\bigr) + \cdots$$
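The successive substitution in (4) can be carried out degree by degree, since the degree-$n$ part of $h_2[Z_R] = \frac{1}{2}(Z_R^2 + p_2[Z_R])$ involves only lower-degree parts of $Z_R$. The following Python sketch is one straightforward way to do this and is not the more efficient method alluded to above. The representation (a dictionary from partitions to coefficients of $p_\lambda$) and the names `multiply` and `cycle_index_R` are assumptions of the sketch.

```python
from collections import defaultdict
from fractions import Fraction


def multiply(f, g):
    """Product of two polynomials in the power sums p_1, p_2, ...

    A polynomial is a dict mapping a partition (weakly decreasing tuple)
    to the coefficient of the monomial p_lambda.
    """
    out = defaultdict(Fraction)
    for lam, a in f.items():
        for mu, b in g.items():
            out[tuple(sorted(lam + mu, reverse=True))] += a * b
    return dict(out)


def cycle_index_R(max_degree):
    """Homogeneous parts Z[1], ..., Z[max_degree] of Z_R, from Z_R = p_1 + h_2[Z_R]."""
    Z = {1: {(1,): Fraction(1)}}
    for n in range(2, max_degree + 1):
        part = defaultdict(Fraction)
        # Degree-n part of Z_R^2 / 2.
        for a in range(1, n):
            for lam, c in multiply(Z[a], Z[n - a]).items():
                part[lam] += c / 2
        # Degree-n part of p_2[Z_R] / 2: replace each p_i by p_{2i}.
        if n % 2 == 0:
            for lam, c in Z[n // 2].items():
                part[tuple(2 * x for x in lam)] += c / 2
        Z[n] = dict(part)
    return Z


Z = cycle_index_R(4)
for n in range(1, 5):
    print(n, Z[n])
```

Multiplying the coefficient of $p_\lambda$ by $z_\lambda$ recovers the number of labeled binary trees fixed by a permutation of cycle type $\lambda$; for instance $\frac{5}{8} \cdot 24 = 15$ for $\lambda = (1,1,1,1)$.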


It is easy to see from (4) that for every power sum $p_n$ that occurs in $Z_R$, $n$ is a power of 2.
From (4) we can also easily derive the well-known functional equation for the ordinary generating
function $R(x)$ (see [1, A001190]),

$$R(x) = x + \tfrac{1}{2}\bigl(R(x)^2 + R(x^2)\bigr),$$

but to count tanglegrams, we need the full cycle index.
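The functional equation for $R(x)$ determines its coefficients recursively. As a minimal sketch (the function name `unlabeled_binary_trees` is an illustrative choice), the coefficient of $x^n$ can be computed from the lower coefficients as follows.

```python
def unlabeled_binary_trees(max_n):
    """Coefficients [x^1], ..., [x^max_n] of R(x), where
    R(x) = x + (R(x)^2 + R(x^2)) / 2."""
    R = [0] * (max_n + 1)
    if max_n >= 1:
        R[1] = 1
    for n in range(2, max_n + 1):
        square = sum(R[a] * R[n - a] for a in range(1, n))  # [x^n] R(x)^2
        diagonal = R[n // 2] if n % 2 == 0 else 0            # [x^n] R(x^2)
        R[n] = (square + diagonal) // 2
    return R[1:]


print(unlabeled_binary_trees(10))
```

The integer division is exact here because `square + diagonal` counts ordered pairs of subtrees plus pairs of equal subtrees, which is twice the number of unordered pairs.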

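Now let $T$ be the species of (labeled) tanglegrams. Since a tanglegram is a pair of binary
trees with the same set of labels, $T$ is the Cartesian product $R \times R$, so
$Z_T = Z_{R \times R} = Z_R * Z_R$. Thus if

$$Z_R = \sum_\lambda r_\lambda \frac{p_\lambda}{z_\lambda}, \tag{5}$$

where the sum is over all partitions $\lambda$, then $Z_T = \sum_\lambda r_\lambda^2\, p_\lambda / z_\lambda$,
and the number of unlabeled tanglegrams with $n$ leaves is

$$\sum_{\lambda \vdash n} \frac{r_\lambda^2}{z_\lambda}. \tag{6}$$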


A formula equivalent to (6) was given by Billey, Konvalinka, and Matsen [4, Theorem 1]; we 
will discuss their result further in Section 6. 
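As a small worked instance of (6), take $n = 4$. The numbers of labeled binary trees on $[4]$ fixed by a permutation of cycle type $(1,1,1,1)$, $(2,1,1)$, $(2,2)$, and $(4)$ are $r_\lambda = 15, 3, 3, 1$, with $z_\lambda = 24, 4, 8, 4$ respectively, and no tree is fixed by a 3-cycle. The arithmetic below is only an illustrative check of the formula.

```latex
\begin{align*}
\sum_{\lambda \vdash 4} \frac{r_\lambda^2}{z_\lambda}
  &= \frac{15^2}{24} + \frac{3^2}{4} + \frac{3^2}{8} + \frac{1^2}{4} \\
  &= \frac{225}{24} + \frac{54}{24} + \frac{27}{24} + \frac{6}{24}
   = \frac{312}{24} = 13 .
\end{align*}
```

So there are 13 unlabeled tanglegrams with four leaves.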

A tangled chain of length $k$ is a $k$-tuple of binary trees sharing the same set of leaves. It
is clear that the species of tangled chains is the $k$th Cartesian power of $R$, so the number of
unlabeled tangled chains of length $k$ is

$$\sum_{\lambda \vdash n} \frac{r_\lambda^k}{z_\lambda}, \tag{7}$$

as also shown by Billey, Konvalinka, and Matsen [4, Theorem 3].

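It may be noted that (7) is an easy consequence of Burnside's lemma: if a group $G$ acts
on a finite set $S$ then the number of orbits is

$$\frac{1}{|G|} \sum_{g \in G} \operatorname{fix}(g), \tag{8}$$

where $\operatorname{fix}(g)$ is the number of elements of $S$ fixed by $g$.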
To derive (7) from Burnside's lemma, we consider the action of $\mathfrak{S}_n$ on $k$-tuples of labeled
binary trees with leaf set $[n]$. A $k$-tuple is fixed by a permutation if and only if all its entries
are fixed, so the $n!/z_\lambda$ permutations of cycle type $\lambda$ contribute $(n!/z_\lambda)\, r_\lambda^k$ to
the sum (8). Thus the number of orbits is

$$\frac{1}{n!} \sum_{\lambda \vdash n} \frac{n!}{z_\lambda}\, r_\lambda^k = \sum_{\lambda \vdash n} \frac{r_\lambda^k}{z_\lambda}.$$
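The Burnside argument can be checked by brute force for small $n$. The sketch below enumerates labeled binary trees explicitly and applies (8) with $G = \mathfrak{S}_n$. It is meant only as a sanity check, since the number of trees and permutations grows very quickly. The tree encoding (nested `frozenset`s) and all function names are assumptions made for this illustration.

```python
from itertools import permutations
from math import factorial


def binary_trees(leaves):
    """All binary trees with the given leaf labels.

    A leaf is its integer label; an internal vertex is a frozenset of its
    two (unordered) subtrees.
    """
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return [leaves[0]]
    first, rest = leaves[0], leaves[1:]
    trees = []
    # Decide which other labels share a root subtree with the smallest label;
    # the all-ones mask is excluded so the other subtree is never empty.
    for mask in range(2 ** len(rest) - 1):
        left = [first] + [x for i, x in enumerate(rest) if mask >> i & 1]
        right = [x for i, x in enumerate(rest) if not mask >> i & 1]
        for a in binary_trees(left):
            for b in binary_trees(right):
                trees.append(frozenset((a, b)))
    return trees


def relabel(tree, sigma):
    """Apply the permutation sigma (a dict on labels) to the leaves of a tree."""
    if isinstance(tree, frozenset):
        return frozenset(relabel(t, sigma) for t in tree)
    return sigma[tree]


def unlabeled_tangled_chains(n, k):
    """Orbits of S_n on k-tuples of binary trees with leaf set [n], via (8)."""
    labels = range(1, n + 1)
    trees = binary_trees(labels)
    total = 0
    for perm in permutations(labels):
        sigma = dict(zip(labels, perm))
        fixed = sum(1 for t in trees if relabel(t, sigma) == t)
        total += fixed ** k  # a k-tuple is fixed iff each entry is fixed
    return total // factorial(n)


print([unlabeled_tangled_chains(n, 2) for n in range(1, 5)])
```

With $k = 2$ this counts unlabeled tanglegrams, and with $k = 1$ it counts unlabeled binary trees.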


4. Unordered tanglegrams 


To count unordered tanglegrams with species, we need another operation on species, inner
plethysm, that is not as well known as the other operations. Inner plethysm is a kind of
composition of species that bears the same relation to the Cartesian product that ordinary
composition bears to the ordinary product. It is also closely related to the operation of
functorial composition of species introduced in [5] and discussed further in [2, Section 2.2].
The term "inner plethysm" was introduced by D. E. Littlewood [11] for the corresponding
operation on symmetric functions, and the species operation was introduced by L. Travis in
his Ph.D. thesis [15]; we refer to his thesis for results about inner plethysm not proved
here.

There is no standard notation for inner plethysm, so we will introduce the notation $F\{G\}$
for the inner plethysm of species, with the same notation for inner plethysm of symmetric
functions. We will define here only the inner plethysm $E_n\{G\}$, which is the only case that we
will need: for any finite set $A$, $E_n\{G\}[A]$ is the set of multisets of size $n$ of elements of $G[A]$.
We can define $E_n\{G\}$ in another way: the symmetric group $\mathfrak{S}_n$ acts on the elements of the
$n$th Cartesian power $G^{\times n}[A]$ by permuting the coordinates, and the elements of $E_n\{G\}[A]$
are the orbits under this action. (The functorial composition of species is defined similarly,
but with a set, rather than a multiset, of elements of $G[A]$.)

Inner plethysm of symmetric functions is determined by the following:

(1) For fixed $g$, the map $f \mapsto f\{g\}$ is a homomorphism from the ring of symmetric
    functions with the usual product to the ring of symmetric functions with the Kronecker
    product.

(2) For a partition $\lambda$ of $n$ and an integer $k$, let $\lambda^k$ denote the cycle type of the $k$th power
    of a permutation with cycle type $\lambda$. Then

    $$p_k\Bigl\{ \sum_\lambda a_\lambda \frac{p_\lambda}{z_\lambda} \Bigr\} = \sum_\lambda a_{\lambda^k} \frac{p_\lambda}{z_\lambda}.$$
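Rule (2) is easy to make concrete: the $k$th power of an $i$-cycle splits into $\gcd(i, k)$ cycles of length $i/\gcd(i, k)$, which determines $\lambda^k$. The Python sketch below computes $\lambda^k$ and applies rule (2) to a single homogeneous component. The names `power_cycle_type` and `p_k_inner`, and the dictionary representation of the coefficients $a_\lambda$, are assumptions of the sketch.

```python
from math import gcd


def power_cycle_type(lam, k):
    """Cycle type of sigma^k when sigma has cycle type lam.

    The kth power of an i-cycle is a product of gcd(i, k) cycles,
    each of length i / gcd(i, k).
    """
    parts = []
    for i in lam:
        d = gcd(i, k)
        parts.extend([i // d] * d)
    return tuple(sorted(parts, reverse=True))


def p_k_inner(coeffs, k, partitions_of_n):
    """Coefficients of p_lambda / z_lambda in p_k{g}, where coeffs holds the
    coefficients a_lambda of p_lambda / z_lambda in g (degree n part only)."""
    return {lam: coeffs.get(power_cycle_type(lam, k), 0) for lam in partitions_of_n}


# Degree-4 coefficients of Z_R with respect to p_lambda / z_lambda.
a4 = {(1, 1, 1, 1): 15, (2, 1, 1): 3, (2, 2): 3, (4,): 1}
parts4 = [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
print(p_k_inner(a4, 2, parts4))
```

In the example, the coefficient at $\lambda = (4)$ is $a_{(2,2)} = 3$ because the square of a 4-cycle has cycle type $(2,2)$.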


Travis [15, Theorem 2.12] showed that for any species $F$ and $G$, we have $Z_{F\{G\}} = Z_F\{Z_G\}$.
It is clear that the species of unordered tanglegrams is $E_2\{R\}$, where $R$ is the species of
binary trees. So the cycle index for unordered tanglegrams is
$h_2\{Z_R\} = \frac{1}{2}(p_1^2 + p_2)\{Z_R\}$. Thus if $Z_R = \sum_\lambda r_\lambda\, p_\lambda / z_\lambda$ then the
cycle index for unordered tanglegrams is

$$\frac{1}{2} \sum_\lambda \bigl( r_\lambda^2 + r_{\lambda^2} \bigr) \frac{p_\lambda}{z_\lambda},$$

where $\lambda^2$ is the cycle type of the square of a permutation with cycle type $\lambda$, and we obtain
the number of unordered tanglegrams with $n$ leaves by setting each $p_\lambda$ to 1 in the sum of the
terms of degree $n$. This formula for unordered tanglegrams can also be derived directly from
Burnside's lemma, using the action of $\mathfrak{S}_n \times \mathfrak{S}_2$ on labeled tanglegrams,
where $\mathfrak{S}_2$ acts by permuting the entries.

Here are the first few values of the number of unordered tanglegrams with $n$ leaves (see
[1, A259114]):

n     1  2  3  4   5   6    7      8       9        10         11
a_n   1  1  2  10  69  807  13048  269221  6660455  191411477  6257905519
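The first entries of this table can be reproduced from the cycle index formula for unordered tanglegrams, using the explicit product formula (10) of Section 6 for the coefficients $r_\lambda$. The Python sketch below is an independent illustration and not the Maple computation used for the table. The function names are assumptions, and `square_type` is the special case $k = 2$ of the cycle type $\lambda^k$.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def z(lam):
    """z_lambda = prod_i i^{m_i} * m_i!."""
    result = 1
    for part, mult in Counter(lam).items():
        result *= part ** mult * factorial(mult)
    return result


def r(lam):
    """r_lambda from formula (10); zero unless every part is a power of 2."""
    if any(part & (part - 1) for part in lam):
        return 0
    value, tail = 1, sum(lam)
    for part in lam[:-1]:
        tail -= part              # tail = lambda_i + ... + lambda_l for i = 2, 3, ...
        value *= 2 * tail - 1
    return value


def square_type(lam):
    """Cycle type of the square: even parts split in two, odd parts are unchanged."""
    parts = []
    for i in lam:
        parts.extend([i // 2, i // 2] if i % 2 == 0 else [i])
    return tuple(sorted(parts, reverse=True))


def unordered_tanglegrams(n):
    """(1/2) * sum over lambda of (r_lambda^2 + r_{lambda^2}) / z_lambda."""
    total = sum(Fraction(r(lam) ** 2 + r(square_type(lam)), 2 * z(lam))
                for lam in partitions(n))
    assert total.denominator == 1
    return int(total)


print([unordered_tanglegrams(n) for n in range(1, 7)])
```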


Symmetric function computations were done with the help of John Stembridge's Maple
package for symmetric functions [13, 14].

Similarly, $E_k\{R\}$ is the species of unordered tangled chains of length $k$.

5. Unrooted tanglegrams 

To find the cycle index for unrooted trees, we use a dissymmetry theorem, which reduces
the enumeration of unrooted trees to the enumeration of several types of rooted trees, which can
usually be counted by decomposing them. The basic dissymmetry theorem says that if $\mathcal{A}$
is a species of unrooted trees of some type, $\mathcal{A}^{\bullet}$ is the species of $\mathcal{A}$-trees
rooted at a vertex^2, $\mathcal{A}^{-}$ is the species of $\mathcal{A}$-trees rooted at an edge, and
$\mathcal{A}^{\bullet -}$ is the species of $\mathcal{A}$-trees rooted at a vertex and incident edge (or
equivalently, at a directed edge), then

$$\mathcal{A} + \mathcal{A}^{\bullet -} = \mathcal{A}^{\bullet} + \mathcal{A}^{-}. \tag{9}$$

We give here a brief sketch of the proof of (9), referring the reader to [2, Section 4.1] for a
more detailed discussion. Every tree has a unique "center", which is a vertex or edge that
is fixed by every automorphism of the tree. An unrooted tree may be identified with a tree
rooted at its center. To prove (9), we describe a bijection, equivariant with respect to the
automorphism group of the tree, from the non-center vertices and edges of a tree to pairs
consisting of a vertex and an incident edge. If $v$ is a non-center vertex, we pair it with the
first edge on the unique path from $v$ to the center (this edge may be the center), and if $e$ is
a non-center edge, we pair it with the first vertex on the unique path from $e$ to the center
(this vertex may be the center). From the bijection just described, we get a bijection from
$\mathcal{A}$-trees rooted at a vertex or edge to $\mathcal{A}$-trees rooted at a center vertex or edge
(equivalent to unrooted $\mathcal{A}$-trees) or at a vertex and incident edge.

^2 A tree rooted at a vertex is formally an ordered pair $(T, v)$, where $T$ is a tree and $v$ is a
vertex of $T$. Trees rooted at edges etc. are defined similarly.

Now let $U$ be the species of "unrooted binary trees"; that is, unrooted trees in which
every vertex has degree one or three, the leaves (vertices of degree one) are labeled, and the
internal vertices (of degree three) are unlabeled. (We are not including the tree with one
vertex.) First we consider $U$-trees rooted at an edge $e$. Removing the edge $e$ and rooting
the remaining two trees at the vertices incident with $e$ gives two rooted binary trees. Thus
$U^{-} = E_2(R) = R - X$. Similarly, we can remove the root edge from a $U$-tree rooted at a
vertex and incident edge to obtain a pair of rooted trees, but in this case the rooted trees
are ordered, so $U^{\bullet -} = R^2$. Finally, the $U$-trees rooted at a vertex may be rooted at either
an internal vertex or a leaf. The species of $U$-trees rooted at an internal vertex is $E_3(R)$ and
the species of $U$-trees rooted at a leaf is $XR$, so $U^{\bullet} = E_3(R) + XR$. Thus (9) gives

$$U + R^2 = E_3(R) + XR + R - X,$$

and we obtain a formula for the cycle index of $U$,

$$Z_U = h_3[Z_R] + p_1 Z_R + Z_R - Z_R^2 - p_1.$$
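As a consistency check on the formula for $Z_U$, one can substitute $p_i = x^i$, which turns it into an identity for ordinary generating functions of unlabeled trees: $U(x) = \frac{1}{6}\bigl(R(x)^3 + 3R(x)R(x^2) + 2R(x^3)\bigr) + xR(x) + R(x) - R(x)^2 - x$, using $h_3 = \frac{1}{6}(p_1^3 + 3p_1p_2 + 2p_3)$. The Python sketch below computes only these unlabeled tree counts, not the full cycle index needed for tanglegrams. The function names are illustrative assumptions.

```python
def rooted_series(max_n):
    """Coefficients R[0..max_n] of R(x) = x + (R(x)^2 + R(x^2)) / 2."""
    R = [0] * (max_n + 1)
    R[1] = 1
    for n in range(2, max_n + 1):
        square = sum(R[a] * R[n - a] for a in range(1, n))
        diagonal = R[n // 2] if n % 2 == 0 else 0
        R[n] = (square + diagonal) // 2
    return R


def series_mul(a, b, max_n):
    """Product of two power series, truncated after x^max_n."""
    out = [0] * (max_n + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j > max_n:
                break
            out[i + j] += ai * bj
    return out


def dilate(a, k, max_n):
    """Coefficients of a(x^k), truncated after x^max_n."""
    out = [0] * (max_n + 1)
    for i, ai in enumerate(a):
        if i * k <= max_n:
            out[i * k] = ai
    return out


def unlabeled_unrooted_binary_trees(max_n):
    """Coefficients U[0..max_n] of U(x), from the dissymmetry formula (max_n >= 1)."""
    R = rooted_series(max_n)
    R2 = series_mul(R, R, max_n)
    R3 = series_mul(R2, R, max_n)
    R_R2 = series_mul(R, dilate(R, 2, max_n), max_n)   # R(x) R(x^2)
    R_3 = dilate(R, 3, max_n)                          # R(x^3)
    x = [0, 1] + [0] * (max_n - 1)
    xR = series_mul(x, R, max_n)
    U = []
    for n in range(max_n + 1):
        e3 = (R3[n] + 3 * R_R2[n] + 2 * R_3[n]) // 6   # [x^n] of E_3(R)
        U.append(e3 + xR[n] + R[n] - R2[n] - x[n])
    return U


print(unlabeled_unrooted_binary_trees(7)[2:])
```

The printed list starts at $n = 2$ leaves, the smallest tree included in $U$.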

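The first few terms of $Z_U$ are

$$\bigl(\tfrac{1}{2} p_1^2 + \tfrac{1}{2} p_2\bigr) + \bigl(\tfrac{1}{6} p_1^3 + \tfrac{1}{2} p_1 p_2 + \tfrac{1}{3} p_3\bigr) + \bigl(\tfrac{1}{8} p_1^4 + \tfrac{1}{4} p_1^2 p_2 + \tfrac{3}{8} p_2^2 + \tfrac{1}{4} p_4\bigr) + \cdots$$

Then the species of unrooted tanglegrams is the Cartesian product $U \times U$, with cycle
index $Z_U * Z_U$, and the species of unrooted unordered tanglegrams is $E_2\{U\}$, with cycle
index $h_2\{Z_U\}$. The numbers $b_n$ of unrooted tanglegrams and $c_n$ of unrooted unordered
tanglegrams for small values of $n$ are as follows: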


n     2  3  4  5  6   7    8     9      10       11        12
b_n   1  1  2  4  31  243  3532  62810  1390718  36080361  1076477512
c_n   1  1  2  4  22  145  1875  31929  698183   18056523  538340256

The sequence $b_n$ is [1, A259115] and the sequence $c_n$ is [1, A259116].


6. The formula of Billey, Konvalinka, and Matsen 


Billey, Konvalinka, and Matsen [4, Proposition 4] proved (though they did not state it
this way) that if $\lambda$ is a binary partition of $n$ (a partition in which every part is a power of
2) then the coefficient $r_\lambda$ of $p_\lambda / z_\lambda$ in $Z_R$ is given by the simple explicit formula

$$r_\lambda = \prod_{i=2}^{l(\lambda)} \bigl( 2(\lambda_i + \cdots + \lambda_{l(\lambda)}) - 1 \bigr), \tag{10}$$

where $l(\lambda)$ is the number of parts of $\lambda$. (If $\lambda$ is not a binary partition then $r_\lambda = 0$.) They
proved (10) by showing that the right side satisfies the recurrence corresponding to (4).
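Formula (10) makes the counts (6) and (7) directly computable. The following Python sketch evaluates $r_\lambda$ and sums $r_\lambda^k / z_\lambda$ over the partitions of $n$. The function names (`r`, `z`, `partitions`, `unlabeled_tangled_chains`) are illustrative assumptions, and the code is a plain transcription of the formulas rather than an optimized implementation.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def z(lam):
    """z_lambda = prod_i i^{m_i} * m_i!."""
    result = 1
    for part, mult in Counter(lam).items():
        result *= part ** mult * factorial(mult)
    return result


def r(lam):
    """r_lambda from (10): prod over i >= 2 of 2 * (lambda_i + ... + lambda_l) - 1,
    or 0 if some part of lam is not a power of 2."""
    if any(part & (part - 1) for part in lam):
        return 0
    value, tail = 1, sum(lam)
    for part in lam[:-1]:
        tail -= part              # drop lambda_{i-1}; tail = lambda_i + ... + lambda_l
        value *= 2 * tail - 1
    return value


def unlabeled_tangled_chains(n, k):
    """Formula (7): sum over partitions lambda of n of r_lambda^k / z_lambda."""
    total = sum(Fraction(r(lam) ** k, z(lam)) for lam in partitions(n))
    assert total.denominator == 1
    return int(total)


print([unlabeled_tangled_chains(n, 2) for n in range(1, 7)])  # tanglegrams, formula (6)
```

Taking $k = 1$ gives the number of unlabeled binary trees, which offers a quick cross-check against the functional equation for $R(x)$.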

It would be nice to place (10) in a larger context, but I do not know how to do this. One
possible approach is Lagrange inversion. If we let $r(z)$ be the result of setting $p_1 = z$ and
$p_i = 0$ for $i > 1$ in $Z_R$ then, as noted in [4], $r(z)$ is the exponential generating function for
labeled binary trees, and (4) yields

$$r(z) = z + r(z)^2/2, \tag{11}$$


which can be solved by Lagrange inversion to give

$$r(z) = \sum_{n=1}^{\infty} \frac{1}{2^{n-1} n} \binom{2n-2}{n-1} z^n = \sum_{n=1}^{\infty} 1 \cdot 3 \cdot 5 \cdots (2n-3)\, \frac{z^n}{n!}, \tag{12}$$

which is what (10) gives for $\lambda = (1^n)$. (Equation (12) can also be obtained directly by
solving (11) to get $r(z) = 1 - \sqrt{1 - 2z}$, and then applying the binomial theorem.) So we
might hope that some generalization of Lagrange inversion might yield (10). There is a
Lagrange inversion formula for composition of symmetric functions, due to Labelle [10] (see
also [6]), and Labelle applies it, in his Corollary B, to (4), obtaining

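$$Z_R = \sum_{k_1, k_2, \dots = 0}^{\infty} \frac{\partial^{k_1 + k_2 + \cdots}}{\partial p_1^{k_1}\, \partial p_2^{k_2} \cdots}\; p_1 \prod_{i=1}^{\infty} (1 - p_i) \prod_{i=1}^{\infty} \frac{(p_i^2 + p_{2i})^{k_i}}{2^{k_i}\, k_i!}. \tag{13}$$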


However, it is not clear that (10) can be derived from (13); it is not even obvious from (13) 
that the only nonzero terms in Zr correspond to binary partitions. 
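The one-variable statements (11) and (12) are easy to confirm numerically. The sketch below, with illustrative function names, compares three things for small $n$: the coefficients obtained by iterating (11), the binomial form in (12), and the double-factorial form in (12).

```python
from fractions import Fraction
from math import comb, factorial


def egf_coefficients(max_n):
    """Coefficients r[0..max_n] of r(z), solving r = z + r^2 / 2 degree by degree."""
    r = [Fraction(0)] * (max_n + 1)
    r[1] = Fraction(1)
    for n in range(2, max_n + 1):
        r[n] = sum(r[a] * r[n - a] for a in range(1, n)) / 2
    return r


def lagrange_coefficient(n):
    """First form in (12): binomial(2n - 2, n - 1) / (2^(n - 1) * n)."""
    return Fraction(comb(2 * n - 2, n - 1), 2 ** (n - 1) * n)


def double_factorial_coefficient(n):
    """Second form in (12): 1 * 3 * ... * (2n - 3) / n!."""
    odd_product = 1
    for odd in range(1, 2 * n - 2, 2):
        odd_product *= odd
    return Fraction(odd_product, factorial(n))


r = egf_coefficients(10)
for n in range(1, 11):
    print(n, r[n], lagrange_coefficient(n), double_factorial_coefficient(n))
```

This checks only the case $\lambda = (1^n)$ of (10); it says nothing about the other binary partitions.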

Another possible approach to deriving the formula for $r_\lambda$ is that of Wagner [16], but I was
not able to get it to work.


Acknowledgments. I would like to thank Sara Billey, Gilbert Labelle, Erick Matsen, and 
Archie Pedantic for helpful suggestions on earlier versions of this paper. 